Building Subjectivity Lexicon(s) from Scratch for Essay Data

نویسندگان

  • Beata Beigman Klebanov
  • Jill Burstein
  • Nitin Madnani
  • Adam Faulkner
  • Joel R. Tetreault
چکیده

While there are a number of subjectivity lexicons available for research purposes, none can be used commercially. We describe the process of constructing subjectivity lexicon(s) for recognizing sentiment polarity in essays written by test-takers, to be used within a commercial essay-scoring system. We discuss ways of expanding a manually-built seed lexicon using dictionary-based, distributional indomain and out-of-domain information, as well as using Amazon Mechanical Turk to help “clean up” the expansions. We show the feasibility of constructing a family of subjectivity lexicons from scratch using a combination of methods to attain competitive performance with state-of-art research-only lexicons. Furthermore, this is the first use, to our knowledge, of a paraphrase generation system for expanding a subjectivity lexicon.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Pivot-Based Paraphrasing and Sentiment Profiles to Improve a Subjectivity Lexicon for Essay Data

We demonstrate a method of improving a seed sentiment lexicon developed on essay data by using a pivot-based paraphrasing system for lexical expansion coupled with sentiment profile enrichment using crowdsourcing. Profile enrichment alone yields up to 15% improvement in the accuracy of the seed lexicon on 3way sentence-level sentiment polarity classification of essay data. Using lexical expansi...

متن کامل

Czech Subjectivity Lexicon: A Lexical Resource for Czech Polarity Classification

This paper introduces Czech subjectivity lexicon – the new lexical resource for sentiment analysis in Czech. The lexicon is a dictionary of 4947 evaluative items annotated with part of speech and tagged with positive or negative polarity. We describe the method for building the basic vocabulary and the criteria for its manual refinement. Also, we suggest possible enrichment of the fundamental l...

متن کامل

Building a fine-grained subjectivity lexicon from a web corpus

In this paper we propose a method to build fine-grained subjectivity lexicons including nouns, verbs and adjectives. The method, which is applied for Dutch, is based on the comparison of word frequencies of three corpora: Wikipedia, News and News comments. Comparison of the corpora is carried out with two measures: log-likelihood ratio and a percentage difference calculation. The first step of ...

متن کامل

A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources

This paper introduces a method for creating a subjectivity lexicon for languages with scarce resources. The method is able to build a subjectivity lexicon by using a small seed set of subjective words, an online dictionary, and a small raw corpus, coupled with a bootstrapping process that ranks new candidate words based on a similarity measure. Experiments performed with a rule-based sentence l...

متن کامل

A Large Scale Arabic Sentiment Lexicon for Arabic Opinion Mining

Most opinion mining methods in English rely successfully on sentiment lexicons, such as English SentiWordnet (ESWN). While there have been efforts towards building Arabic sentiment lexicons, they suffer from many deficiencies: limited size, unclear usability plan given Arabic’s rich morphology, or nonavailability publicly. In this paper, we address all of these issues and produce the first publ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012